class: title-slide, left, bottom

<img src="data:image/png;base64,#img/session04/ggplot2_masterpiece.PNG" width="40%"/>

# Introduction to ggplot2

----

## **Session 4**

###

###

.right-column[
]

.footnote[Artwork by @allison_horst]

???

Check everyone is back.

We'll make a start on Session 4, which is a big one and covers how to make graphs in R using ggplot2.

---

# Acknowledgement

This session shadows [Chapter 3](https://r4ds.had.co.nz/data-visualisation.html) of the excellent:

<img class="center" src="data:image/png;base64,#img/session04/r-for-data-science.PNG" width="40%"/>

???

This syllabus was lovingly inspired by, and not at all shamefacedly thieved from, Chapter 3 of the R for Data Science book, which is free online. So if you want more detail on any of this, I'd recommend checking out R4DS.

---

# ggplot2

Is one of several plotting systems in R

<img class="center" src="data:image/png;base64,#img/session04/tweet-poll.PNG" width="40%"/>

{plotly} is used by [Public Health Scotland](https://github.com/Public-Health-Scotland/scotpho-plotly-charts/blob/master/plotly_chart_functions.R)

???

ggplot2 is one of many plotting packages in R but, as this highly scientific Twitter poll suggests, it is the most popular. Other packages are used too (plotly, for example, seems to be used by Public Health Scotland), but ggplot2 is the most common.

---

# Why is ggplot popular?

</br> </br>

1. Well designed and supported

</br> </br>

2. Highly versatile

</br> </br>

3. Attractive graphics (with a little work)

.footnote[1. Why start with ggplot : http://varianceexplained.org/r/teach_ggplot2_to_beginners/
2. Argument against: https://simplystatistics.org/2016/02/11/why-i-dont-use-ggplot2/]

???

Why is that? Well, it's been around for a while, and is very well supported, with lots of features. It is capable of a lot of customisation, so you can produce a really incredible range of visualisations.
And you can, if you learn a bit about the various options, produce really attractive graphics.

---

[</br> <img class="center" src="data:image/png;base64,#img/session04/bbc-plots.PNG"/>](https://bbc.github.io/rcookbook/#left-alignright-align_text)

???

The BBC uses ggplot2 graphics, showing just how flexible and good-looking these graphics can be made.

---

# ggplot2

</br> </br>

ggplot2 is part of the tidyverse.

</br> </br>

So, at the top of your script type:

```r
library(tidyverse)
```

???

Let's get started with some code. ggplot2 is part of the tidyverse, so to load it, write library(tidyverse) at the top of your script.

---

class: inverse, middle, center

## Project 1:

</br> </br>

Let’s explore a perennial </br> </br> challenge for the NHS:

???

And as a worked example....

---

# Pressures in A&E

</br>

<img class="center" src="data:image/png;base64,#img/session04/demand-and-capacity.PNG"/>

.footnote[
1. [Picture 1](https://www.deviantart.com/luckymarine577/art/Animated-Ambulance-338970039) by Unknown Author is licensed under [CC BY SA NC](https://creativecommons.org/licenses/by-nc-sa/3.0/)
2. [Picture 2](https://creativecommons.org/licenses/by-nc-sa/3.0/) by Unknown Author is licensed under [CC BY SA NC](https://creativecommons.org/licenses/by-nc-sa/3.0/)
]

???

We are going to look at A&E pressures: how A&E capacity deals with demand.

---

# Data: Capacity in A&E

</br>

The dataset we loaded earlier, `capacity_ae`, shows </br> </br> changes in the capacity of A&E departments from </br> </br> 2017 to 2018

.footnote[Closely based on datasets collected by the NHS Benchmarking Network]

--

</br> </br>

The object named .green[capacity_ae] is a data frame

???

So, we've already loaded a dataset in the last session, capacity_ae, showing changes in the capacity of A&E departments from 2017 to 2018.

The dataset is stored as something called a data frame. So you might be asking what a data frame is.

---

## What is a data frame?
A data frame stores tabular data:

<img class="center" src="data:image/png;base64,#img/session04/tidydata_1.JPG" width="90%"/>

.footnote[Artwork by @allison_horst]

???

You can think of a data frame as a big table, essentially. Ideally the table should have a 'tidy' format, in which each variable has a column, each observation a row, and each cell a measurement of the variable at that observation.

---

## tibble = data frame

In the tidyverse you may see the term "tibble"

</br> </br>

We’ll take "tibble" to be synonymous with "data frame"

</br>

>A tibble... is a modern reimagining of the data.frame...
>
>Tibbles are data.frames that are .blue[**lazy and surly**]: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and .blue[**complain more**] (e.g. when a variable does not exist).
>
> This forces you to confront problems earlier, typically leading to cleaner, more expressive code.

.footnote[emphasis added to [quote](https://tibble.tidyverse.org/)]

???

Something worth mentioning is that in the tidyverse data frames are reimagined as 'tibbles'. A tibble is slightly different to a data frame, but for our purposes we can treat them as identical. In fact, the main difference is that tibbles are deliberately slightly harder to work with. They will give you more error messages where things aren't as tidy as they could be. That is to force you into doing better cleaning or formatting of your data. However, as I say, the difference isn't something we need to worry about for this example.

---

## Viewing the data frame

### Option 1

<img class="center" src="data:image/png;base64,#img/session04/view-data-frame.PNG" width="90%"/>

???

I'm going to invite you to follow along here. You should have the script from before where you loaded the csv file. It should look something like the one on the slide. If you have library(readr) rather than library(tidyverse), it's probably a good idea to change that to tidyverse now.
Remember the tidyverse contains both readr and ggplot2. It's always a good idea to have a look at your data before plotting, so let's view the data frame by clicking on it in the environment pane.

---

## Viewing the data frame

This brings up a view of the data in a new tab:

<img class="center" src="data:image/png;base64,#img/session04/view-capacity-ae.PNG" width="90%"/>

???

That should bring up the data in a new tab.

---

## Viewing the data frame

Click here to show the data frame in a new window

<img class="center" src="data:image/png;base64,#img/session04/open-view-window.PNG" width="90%"/>

Useful when using multiple monitors

???

If you've got multiple monitors, the little arrow and table icon will open it in a new window, which can be useful.

---

## Viewing the data frame

### Option 2

Type the name of the dataset in the editor or console, and run the line (shortcut <kbd> Ctrl + Enter</kbd>)

<img class="center" src="data:image/png;base64,#img/session04/view-data-frame2.PNG" width="60%"/>

???

An alternative way of viewing the data is to type the name of the dataset in the editor or console and run the line with Ctrl + Enter. This should print it out to the console down here. You'll only get the first few lines of the data though.

---

class: inverse, center, middle

## Q. Do we understand the variable names?

_(and what they mean)_

--

<img class="center" src="data:image/png;base64,#img/session04/variable-names.PNG" width="60%"/>

???

So the first check we want to do with the dataset is to make sure we understand what the variables are.

In this case, we have an ID column which indicates the site, an attendance column for 2018, a logical column saying whether staff numbers increased or not, and then two more columns: dcubicles, showing the net change in the number of cubicles between 2017 and 2018, and dwait, the change in average length of attendance over the same period.

---

### "The simple graph has brought more information to the data analyst's mind than any other device"

<img class="center" src="img/session04/ggplot2_exploratory.PNG" width="50%">

.footnote[John Tukey, quoted in [R for Data Science](https://r4ds.had.co.nz/data-visualisation.html)]

???

It's always good to do a few graphs when investigating things, because you can often spot things visually that are hidden in the data. Even if you're doing some additional modelling, it's always useful to do some exploratory work with graphs.

---

# Q. Is a change in the number of cubicles available in A&E associated with a change in length of attendance?

--

### Let's explain the code

We begin our plot with ggplot()

```r
ggplot() +
```

--

Inside ggplot() we can specify the dataset

```r
ggplot(data = capacity_ae)
```

--

Next, we add layer(s), with + at the end of the previous line

```r
ggplot(data = capacity_ae) +
  geom_point(aes(x = dcubicles, y = dwait))
```

???

So one thing we might be interested in exploring is whether there is a relationship between changes in the number of cubicles and changes in attendance times.

To explore this with a plot, we first start with the ggplot function, initially with nothing in the brackets. We add a 'plus' sign to indicate there will be more to come.

Inside the brackets we put the dataset as an argument, so data = capacity_ae tells ggplot where the dataset is.

Then, to tell ggplot what sort of graph we want, we add a layer. In this case the layer is a geom_point() function.
ggplot will recognise this as an added layer because we have the plus on the prior line. The geom_point() function tells ggplot we want a dot plot or scatterplot. In the function, we have an aes function, which I'll explain now.

---

class: center, middle

# Choices

There are choices about the chart to use but also the details of the chart

<img class="center" src="data:image/png;base64,#img/session04/pie_charts.JPG" width="80%"/>

???

So to back up a bit, whenever we use a chart, we have a number of choices. These are about the type of chart, but also the details on it, how the data is represented, all the way down to colours and labels.

---

class: center, middle

# Choices

</br> </br>

1. What shape will represent the data points?

</br> </br>

# .black[<svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;height:2em;" xmlns="http://www.w3.org/2000/svg"> <path d="M512 512H0V0h512v512z"></path></svg>]

???

Next slide

---

class: center, middle

# Choices

</br> </br>

1. What shape will represent the data points?

</br> </br>

# .black[<svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;height:2em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 8C119 8 8 119 8 256s111 248 248 248 248-111 248-248S393 8 256 8z"></path></svg>]

.pull-right[ .blue[**geom**]etric object]

???

The first question is what sort of shape will represent the data points. This is a geom, short for geometric object. That's why our scatterplot is geom_point(): we're going to have each data point as a dot or point.

---

class: center, middle

# Choices

.darkgrey[ 1\. What shape will represent the data? .blue[geom]]

2\. What visual (.blue[**aes**]thetic) attributes do we give to the geom?

# .black[<svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;height:2em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 8C119 8 8 119 8 256s111 248 248 248 248-111 248-248S393 8 256 8z"></path></svg>]

???

We also have to decide what the point looks like. These are aesthetic attributes, and include

---

class: center, middle

# Choices

.darkgrey[ 1\. What shape will represent the data? .blue[geom]]

2\. What visual (.blue[**aes**]thetic) attributes do we give to the geom?

# .black[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 8C119 8 8 119 8 256s111 248 248 248 248-111 248-248S393 8 256 8z"></path></svg>]

.pull-right[
## .blue[size]
]

???

how big each point is....

---

class: center, middle

# Choices

.darkgrey[ 1\. What shape will represent the data? .blue[geom]]

2\. What visual (.blue[**aes**]thetic) attributes do we give to the geom?

# <svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;height:2em;" xmlns="http://www.w3.org/2000/svg"> <path d="M377.941 169.941V216H134.059v-46.059c0-21.382-25.851-32.09-40.971-16.971L7.029 239.029c-9.373 9.373-9.373 24.568 0 33.941l86.059 86.059c15.119 15.119 40.971 4.411 40.971-16.971V296h243.882v46.059c0 21.382 25.851 32.09 40.971 16.971l86.059-86.059c9.373-9.373 9.373-24.568 0-33.941l-86.059-86.059c-15.119-15.12-40.971-4.412-40.971 16.97z"></path></svg> .black[<svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;height:2em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 8C119 8 8 119 8 256s111 248 248 248 248-111 248-248S393 8 256 8z"></path></svg>]

.pull-right[
## position (x axis)
]

???

Where it is on the graph and what that means....

---

class: center, middle

# Choices

.darkgrey[ 1\. What shape will represent the data? .blue[geom]]

2\. What visual (.blue[**aes**]thetic) attributes do we give to the geom?

# <svg viewBox="0 0 256 512" style="position:relative;display:inline-block;top:.1em;height:2em;" xmlns="http://www.w3.org/2000/svg"> <path d="M214.059 377.941H168V134.059h46.059c21.382 0 32.09-25.851 16.971-40.971L144.971 7.029c-9.373-9.373-24.568-9.373-33.941 0L24.971 93.088c-15.119 15.119-4.411 40.971 16.971 40.971H88v243.882H41.941c-21.382 0-32.09 25.851-16.971 40.971l86.059 86.059c9.373 9.373 24.568 9.373 33.941 0l86.059-86.059c15.12-15.119 4.412-40.971-16.97-40.971z"></path></svg> .black[<svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;height:2em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 8C119 8 8 119 8 256s111 248 248 248 248-111 248-248S393 8 256 8z"></path></svg>]

.pull-right[
## position (y axis)
]

???

Next Slide

---

class: center, middle

# Choices

.darkgrey[ 1\. What shape will represent the data? .blue[geom]]

2\. What visual (.blue[**aes**]thetic) attributes do we give to the geom?

# <svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;fill:green;height:2em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 8C119 8 8 119 8 256s111 248 248 248 248-111 248-248S393 8 256 8z"></path></svg>

.pull-right[
## colour
]

???

What colour it is, and whether that has any significance....

---

# A statistical graphic

Shape/colour/size <svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M190.5 66.9l22.2-22.2c9.4-9.4 24.6-9.4 33.9 0L441 239c9.4 9.4 9.4 24.6 0 33.9L246.6 467.3c-9.4 9.4-24.6 9.4-33.9 0l-22.2-22.2c-9.5-9.5-9.3-25 .4-34.3L311.4 296H24c-13.3 0-24-10.7-24-24v-32c0-13.3 10.7-24 24-24h287.4L190.9 101.2c-9.8-9.3-10-24.8-.4-34.3z"></path></svg> geom all default

```r
ggplot(data = capacity_ae) +
* geom_point(aes(x = dcubicles, y = dwait))
```

???

So, when we specify geom_point() to add a scatterplot layer, we also need to tell it our aesthetics, inside this aes() function.
Here we are saying that the position on the x axis denotes the dcubicles value, the change in cubicles, and on the y axis we have dwait, or change in attendance time.

---

class: middle, center

# Functions ()

ggplot(), geom_point(), and aes() are functions

</br> </br>

Running a function does something

</br> </br>

Functions are given zero or more inputs (arguments)

</br> </br>

Arguments of a function are separated by commas

???

Here I'm going to take a quick interlude to talk about functions. ggplot syntax can look a bit complex because there are lots of individual functions being used and stitched together.

You can think of R functions as being similar to the Excel functions you'd get in a spreadsheet. Running a function will give you an output, and what you get out will depend on the inputs, or arguments, that you put in.

Where you specify more than one argument, you'll need to separate them with a comma, just like an Excel function.

---

class: center

# Functions ()

</br>

You can explicitly name arguments;

</br>

```r
ggplot(data = capacity_ae) +
```

</br>

Or not:

</br>

```r
ggplot(capacity_ae) +
```

???

Each function will have names for its arguments. You can explicitly tell the function which input corresponds to each argument by including the name of the argument and setting it equal to the input, so here we're telling ggplot the data argument should be capacity_ae.

But you don't have to do this. If you don't specify the name of the argument, R will match inputs to arguments by position, so as data is the first argument of the ggplot function, it will assume the first thing in the brackets is the data argument.

---

# Functions ()

</br>

Other arguments like axes x and y are in a particular order;

</br>

```r
ggplot(data = capacity_ae) +
* geom_point(aes(x = dcubicles, y = dwait))
```

</br>

It is possible to write it like:

```r
ggplot(data = capacity_ae) +
* geom_point(aes(y = dwait, x = dcubicles))
```

But this could be confusing.

???

If you do specify argument names, you can write them out of order. Here we've reversed the x and y arguments of aes, though doing so can get a bit confusing.

---

# Functions ()

Here, we have provided **ggplot()** with one named argument

.pull-right[
.darkgrey[ggplot(.blue[**data = capacity_ae**]) + </br> geom_point(.blue[**aes(x = dcubicles, y = dwait)**])]
]

And given **aes()** two named arguments

</br> </br>

Unspecified (yet required) arguments will often revert to .green[default values]

???

So to recap what we've done here, we pass one data argument to ggplot, with the name of the dataset, and then two arguments to aes inside the geom function.

---

# Shorthand

Since ggplot2 knows the order of essential arguments, it is not necessary to name arguments:

.green[data = can be omitted </br>] and .green[x = goes first and y = goes second]

```r
ggplot(capacity_ae) +
* geom_point(aes(dcubicles, dwait))
```

???

If you like, you can also omit the names of the arguments, as long as you give them in the order the function expects, which for aes is x first, then y second.

---

# geoms

We tend to describe plots in terms of the geom used:

</br>

<img class="center" src="data:image/png;base64,#img/session04/geoms.PNG" width="80%"/>

???

As well as scatterplots, we can specify other types of graph by changing which geom function we use. We have geom_bar() for bar charts, geom_line() for line graphs, geom_boxplot(), geom_histogram(), and lots of other geoms for different types of graph.
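---

# geoms: a quick sketch

Swapping the geom changes the type of chart while the data and aesthetics stay the same. The sketch below is only an illustration: `toy_ae` is a made-up stand-in for `capacity_ae` (same column names, invented values), so the shapes it draws say nothing about the real data.

```r
library(ggplot2)

# Hypothetical stand-in for capacity_ae: invented values, for illustration only
toy_ae <- data.frame(
  dcubicles = c(-2, 0, 1, 3, 5, 8),
  dwait     = c(12, 5, -3, -10, -18, -25)
)

# Same data, same aesthetics: only the geom function changes
p_points <- ggplot(data = toy_ae) +
  geom_point(aes(x = dcubicles, y = dwait))   # scatterplot

p_line <- ggplot(data = toy_ae) +
  geom_line(aes(x = dcubicles, y = dwait))    # line graph

# Print a plot object to draw it
p_points
```

Storing each plot in an object (here `p_points` and `p_line`) is optional; it just makes it easy to draw them one at a time.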
---

# Layering geoms

We can display more than one geom in a plot:

.blue[<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M416 208H272V64c0-17.67-14.33-32-32-32h-32c-17.67 0-32 14.33-32 32v144H32c-17.67 0-32 14.33-32 32v32c0 17.67 14.33 32 32 32h144v144c0 17.67 14.33 32 32 32h32c17.67 0 32-14.33 32-32V304h144c17.67 0 32-14.33 32-32v-32c0-17.67-14.33-32-32-32z"></path></svg>] to add a layer </br> ggplot(data = capacity_ae) .blue[<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M416 208H272V64c0-17.67-14.33-32-32-32h-32c-17.67 0-32 14.33-32 32v144H32c-17.67 0-32 14.33-32 32v32c0 17.67 14.33 32 32 32h144v144c0 17.67 14.33 32 32 32h32c17.67 0 32-14.33 32-32V304h144c17.67 0 32-14.33 32-32v-32c0-17.67-14.33-32-32-32z"></path></svg>] </br> .blue[geom_point](aes(x = dcubicles, y = dwait)) .blue[<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M416 208H272V64c0-17.67-14.33-32-32-32h-32c-17.67 0-32 14.33-32 32v144H32c-17.67 0-32 14.33-32 32v32c0 17.67 14.33 32 32 32h144v144c0 17.67 14.33 32 32 32h32c17.67 0 32-14.33 32-32V304h144c17.67 0 32-14.33 32-32v-32c0-17.67-14.33-32-32-32z"></path></svg>] </br> .blue[geom_smooth](aes(x = dcubicles, y = dwait)) </br> .blue[then specify another geom...]

???

You can also add more than one type of geom in the same plot. As with the first geom, we add these together with a plus sign.

So in this case, we could add a geom_smooth geom, which adds a smoothed line to the plot. We specify the same aesthetics.

---

# Your turn

This is our current plot:

```r
ggplot(data = capacity_ae) +
  geom_point(aes(x = dcubicles, y = dwait))
```

</br>

Add a geom_smooth layer (to help identify patterns)

</br>

Hint: Don't forget the .blue[+] and aes() values in the new layer

???

So if you're following along, have a try at recreating that additional geom_smooth() line, without it on the screen in front of you. I'll just give you a few seconds to have a go, then I'll reveal the answer again.

Remember that you'll need a plus sign after geom_point to join on the additional layer, and within the geom, you'll have to include an aes function.

[pause for a couple of mins]

---

# Your turn - Answer

```r
ggplot(data = capacity_ae) +
  geom_point(aes(x = dcubicles, y = dwait)) +
  geom_smooth(aes(x = dcubicles, y = dwait))
```

???

Ok, so this is what you should have had. Hopefully those who tried it were close or know where they went wrong.

Another mistake that even I make from time to time is to forget that you need two brackets at the end of the geom function, to close off both the aes and geom functions. It's really easy to forget the second one.

---

<!-- -->

???

Ok, so if you got that, you should be able to see this graph.

The line we've plotted is a smoothed line showing that larger increases in cubicles were accompanied by larger reductions in attendance time, so having more cubicles seems to mean people are seen faster.

---

# One more thing

We'd probably prefer a linear fit rather than a non-linear fit:

```r
ggplot(data = capacity_ae) +
  geom_point(aes(x = dcubicles, y = dwait)) +
  geom_smooth(aes(x = dcubicles, y = dwait),
*             method = "lm")
```

???

We might also want to have a straight line rather than a smoothed average, so we can change this by adding a method = "lm" argument.

---

<!-- -->

???

And then we get something like this....

Is there anything that jumps out here?

Well, one thing might be that we have two outliers at the bottom here, associated with reductions in attendance times, but without large changes in cubicle numbers.

---

# What is happening here?

<!-- -->

???

You can see that more clearly with this representation here, where the two points have a slightly different hue from the others, making them stand out.
Using colours can be a useful means of drawing attention or differentiating between data points.

---

class: center, middle

# Hypothesis

</br>

The two sites have seen staffing increases

</br>

We can map point .blue[colour] (aesthetic attribute) to the staff_increase variable to find out

</br>

We will add colour to the chart depending on the value of staff_increase (TRUE or FALSE, 1 or 0)

???

Ok, so maybe the reason for the reduction in wait times is something to do with staffing. We have a variable staff_increase which says whether staff numbers have gone up, so to investigate this we'll represent this staff_increase variable by changing the colours of the geoms according to whether this is TRUE or FALSE.

---

# Adding another dimension

Put an argument **inside** aes() if you want a visual attribute to change with different values of a variable.

```r
ggplot(data = capacity_ae) +
  geom_point(aes(x = dcubicles, y = dwait,
*                colour = staff_increase)) +
  geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")
```

</br>

We could equally have chosen size or shape, but these make the graphic less clear

???

To change colour, we're going to add an aesthetic inside the aes function, and we want it to be the colour argument, and we want the colour to be determined by the staff_increase variable.

We could have used size or shape, but generally colour changes are easier to see and frankly make it look better.

---

# What is happening here? - Answer

The two sites have indeed seen an increase in staff levels, which may explain the fall in dwait even though the change in dcubicles is relatively small.

<img src="data:image/png;base64,#04-workshop_ggplot2_files/figure-html/unnamed-chunk-14-1.png" width="50%" />

???

So when you do that, you'll see that those two outlying points did have an increase in staff, possibly explaining where the fall in attendance time was coming from.
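---

# Shape instead of colour: a sketch

The slides above mention that size or shape could have been used instead of colour. As a rough sketch of what that might look like, the code below maps `staff_increase` to `shape`. `toy_ae` is a made-up stand-in for `capacity_ae` (invented values, same column names), and `size = 3` is an arbitrary choice to make the shapes easier to tell apart.

```r
library(ggplot2)

# Hypothetical stand-in for capacity_ae: invented values, for illustration only
toy_ae <- data.frame(
  dcubicles      = c(-2, 0, 1, 3, 5, 8),
  dwait          = c(12, -30, -3, -28, -18, -25),
  staff_increase = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# shape goes inside aes() because it varies with a variable;
# size goes outside aes() because it is the same for every point
p_shape <- ggplot(data = toy_ae) +
  geom_point(aes(x = dcubicles, y = dwait, shape = staff_increase),
             size = 3)

p_shape
```

Comparing this with the coloured version is a quick way to judge which encoding makes the two groups easier to pick out.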
---

# Important distinction

If you want a visual attribute to be applied across the whole plot, the argument goes **outside** aes():

```r
ggplot(data = capacity_ae) +
* geom_point(aes(x = dcubicles, y = dwait), colour = "red") +
  geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")
```

</br>

This also runs, but it does **not** give red points: inside aes(), "red" is treated as a data value, so ggplot2 picks a default colour and adds a legend:

```r
ggplot(data = capacity_ae) +
* geom_point(aes(x = dcubicles, y = dwait, colour = "red")) +
  geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")
```

???

Now, you might want to make all the points a certain colour, rather than have them determined by a variable. So if you just wanted all your points red, you can specify a colour, but you put the attribute *outside* of the aes function.

---

# Important distinction

Or apply a size globally:

```r
ggplot(data = capacity_ae) +
* geom_point(aes(x = dcubicles, y = dwait),
*            size = 4) +
  geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")
```

???

Similarly you might want to apply the size of the dots globally, outside aes().

---

# Layering geoms

To avoid duplication, we can pass the common local aes() arguments to ggplot to make them global.

Instead of duplicating the same aes(dcubicles, dwait):

```r
ggplot(data = capacity_ae) +
* geom_point(aes(dcubicles, dwait)) +
* geom_smooth(aes(dcubicles, dwait))
```

</br>

Move the aes() to the global ggplot() call:

```r
ggplot(data = capacity_ae, aes(dcubicles, dwait)) +
* geom_point() +
* geom_smooth()
```

???

Sometimes you might end up specifying the same aesthetics in each geom, if you're using more than one. In this case, you can move the aes function up to the main ggplot function, where it will be applied to all subsequent geoms. That can make things look a bit tidier.

---

# Small multiples magic

Another way to visualise the relationship between multiple variables is with a facet_wrap() layer:

```r
ggplot(data = capacity_ae) +
  geom_point(aes(x = dcubicles, y = dwait)) +
* facet_wrap(~ staff_increase)
```

???

Another method of differentiating or adding another dimension to a graph is to use faceting, or "small multiples". This draws a version of the plot for each category in the faceting variable. So here it's the staff_increase variable. The tilde before staff_increase is formula notation: it tells facet_wrap() which variable to split the panels by.

---

# Small multiples

Another way to visualise the relationship between multiple variables is with a facet_wrap() layer:

```r
ggplot(data = capacity_ae) +
  geom_point(aes(x = dcubicles, y = dwait)) +
  facet_wrap(~ staff_increase,
*            ncol = 1)
```

???

We can change the configuration of the individual graphs using the ncol argument, for the number of columns.

---

class: center, middle

# Demonstrating geom charts

(note: these are simple, </br> unpolished graphics)

???

Lastly, I'll just run through some simple types of graph, not especially polished, just to show you alternative geoms.

---

# Q. How are "wait" values distributed?

## Histogram

```r
ggplot(data = capacity_ae) +
  geom_histogram(aes(dwait))
```

???

A histogram showing wait times

[Demo]

---

# Q. How are "wait" values distributed?

## Histogram

```r
ggplot(data = capacity_ae) +
  geom_histogram(aes(dwait), binwidth = 10)
```

</br>

With the bin width set explicitly, so the bins are more sensible:

???

The same histogram, with more sensible bins

---

# Q. Number of attendances by site?

## Bar plot

```r
ggplot(data = capacity_ae) +
  geom_col(aes(x = site, y = attendance2018))
```

--

</br>

Reorder site __by__ attendances

```r
ggplot(data = capacity_ae) +
* geom_col(aes(x = reorder(site, attendance2018), y = attendance2018))
```

???

A bar (or column) plot.

A bar plot reordered by attendances: it can be nice to show bar graphs ordered so people can immediately see the relative size of the variable. This is done with the 'reorder' function inside the aesthetic specification.

---

# Q. Does the change in wait differ by staffing increase?

## Boxplot

```r
ggplot(data = capacity_ae) +
  geom_boxplot(aes(staff_increase, dwait))
```

???

Here's a boxplot showing the change in attendance time by whether staff increased or not.

--

### Plot labels

Can be applied to all types of charts:

```r
ggplot(data = capacity_ae) +
  geom_boxplot(aes(staff_increase, dwait)) +
* labs(title = "Do changes in staffing...",
*      y = "Waiting")
```

???

Here's a boxplot showing the change in attendance time by whether staff increased or not.

And the same with added labels, using the labs function.

---

# To save a plot

```r
ggplot(data = capacity_ae) +
  geom_point(aes(x = dcubicles, y = dwait)) +
  geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")

# ggsave() is run on its own line (no +); it saves the last plot displayed
*ggsave("plot_name.png")
```

???

And the last thing to cover is how to save a plot to file, if you want to retrieve it or paste it into a report. To do this, you use the ggsave function, passing the filename you want as an argument. Run it on its own line after the plot, not joined on with a plus: by default it saves the last plot you displayed.

---

# To save a plot

```r
ggplot(data = capacity_ae) +
  geom_point(aes(x = dcubicles, y = dwait)) +
  geom_smooth(aes(x = dcubicles, y = dwait), method = "lm")

*ggsave("plot_name.png", units = "cm",
*       height = 10, width = 8)
```

</br>

By default, ggsave() saves the plot with the same dimensions as the plot window.

In future, you'll want to add the height, width and units arguments to specify the plot dimensions.

???

You can also pass other arguments to ggsave to specify how big you want the plot. Otherwise it defaults to the same size you see in the plot window.

Ok, that's the end of ggplot. It's a big topic, so thanks for bearing with me. Do we have any questions before the final quick pre-lunch topic?

---

#### This work is licensed as

</br>

Creative Commons </br> Attribution </br> ShareAlike 4.0 </br> International

</br>

To view a copy of this license, visit </br> https://creativecommons.org/licenses/by/4.0/